Introduction and Data Summary

This report analyzes three datasets selected for practicing and demonstrating methods of data visualization. The first is a compilation of housing prices from West Roxbury, with supporting features such as the number of rooms, number of floors, total square footage, and several similar attributes. A multiple linear regression model predicting the total value of a house is fit to this dataset, and the resulting coefficient estimates are presented. The second and third datasets pertain to lakes in Florida: one contains the spatial geometries (shapefiles) of all Florida lakes, while the other contains water quality measurements for a subset of those lakes. Using these two datasets, the water quality parameters of several Polk County lakes are visualized.

Methods

West Roxbury Housing Prices

The West Roxbury housing price dataset contains 14 attributes covering core features of homes, such as the number of rooms, bedrooms, kitchens, and floors. Before fitting the regression, the data are pre-processed: many of the categorical variables are converted to factors, and several variable names are adjusted to conform to R's syntax. An initial model is fit using all of the available variables. This initial model suggests that the number of rooms and the number of bedrooms are less significant than the other features, so those two variables are removed and a second model is fit. In this second model, almost all variables appear significant except for some of the highest factor levels. For example, having one, two, or three fireplaces has a significant influence on total value relative to the baseline level, but having four fireplaces does not appear to be significant. The same pattern appears for a third floor and for a third half bathroom.
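Below is a minimal base R sketch of this workflow. It uses randomly generated data rather than the actual West Roxbury file. The column names (`TOTAL VALUE`, `LIVING AREA`, `ROOMS`, `BEDROOMS`, `FLOORS`, `FIREPLACE`) are assumptions chosen to resemble that file, and only a subset of the 14 attributes is mimicked. The sketch works in three steps:

- `make.names()` replaces the spaces in names with dots.
- `factor()` makes `lm()` estimate one coefficient per level against a baseline level.
- `update()` refits the model without rooms and bedrooms.

```r
# Minimal sketch on synthetic data. Column names are assumptions modelled on
# the West Roxbury file; all values are randomly generated.
set.seed(42)
n <- 200
living    <- round(runif(n, 800, 3500))
fireplace <- sample(0:3, n, replace = TRUE)

housing <- data.frame(
  `TOTAL VALUE` = 100 + 0.1 * living + 15 * fireplace + rnorm(n, sd = 25),
  `LIVING AREA` = living,
  ROOMS         = sample(4:10, n, replace = TRUE),
  BEDROOMS      = sample(1:5, n, replace = TRUE),
  FLOORS        = sample(c(1, 1.5, 2), n, replace = TRUE),
  FIREPLACE     = fireplace,
  check.names   = FALSE  # keep the spaces so the renaming step is visible
)

# Make column names syntactic, e.g. "TOTAL VALUE" becomes "TOTAL.VALUE"
names(housing) <- make.names(names(housing))

# Store categorical attributes as factors so lm() fits one coefficient per
# level, each measured against the baseline (first) level
for (col in c("FLOORS", "FIREPLACE")) {
  housing[[col]] <- factor(housing[[col]])
}

# Initial model with every predictor, then drop ROOMS and BEDROOMS and refit
model_full    <- lm(TOTAL.VALUE ~ ., data = housing)
model_reduced <- update(model_full, . ~ . - ROOMS - BEDROOMS)

print(summary(model_reduced)$coefficients)
```

Because `FIREPLACE` is a factor, the reduced model reports separate coefficients named `FIREPLACE1`, `FIREPLACE2`, and `FIREPLACE3`. This per-level coding is how an individual high level can be insignificant while lower levels of the same variable are significant.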

Florida Lakes

The Florida Lakes water quality dataset contains several measures of water quality, such as pH, alkalinity, calcium, and chlorophyll, which together can be used to evaluate the general health of a lake. The other Florida Lakes dataset contains the lake shapefiles, with features such as total area, perimeter, and the county in which each lake lies.

A challenge that came up while working with these datasets was matching lakes in the water quality dataset to lakes in the shapefiles. The only variable that could be used as a join key was the lake name, and the two datasets use slightly different naming conventions. Additionally, only the shapefiles record which county a lake is in, and many lakes share the same name but lie in different counties. The author therefore chose to focus only on Polk County lakes and to match on name alone.
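The name-based join can be sketched in base R as follows. The two small data frames are hypothetical stand-ins for the shapefile attribute table and the water quality table. The lake names, the column names (`NAME`, `COUNTY`, `Lake`, `CHLOROPHYLL`), and the normalization rules are assumptions, not the conventions used in the actual files. The sketch also flags names that occur in more than one county, since those are the matches a name-only join cannot verify.

```r
# Hypothetical stand-ins: the shapefile attribute table (which has a county)
# and the water quality table (which does not). Lake names, columns, and
# values are made up for illustration.
shapes <- data.frame(
  NAME   = c("Lake Alpha", "LAKE BETA", "Lake Beta", "Gamma Lake"),
  COUNTY = c("POLK", "POLK", "ORANGE", "POLK")
)
water <- data.frame(
  Lake        = c("ALPHA", "Beta", "Gamma", "Delta"),
  CHLOROPHYLL = c(12.0, 3.1, 5.4, 8.8)
)

# Normalize names to a shared key: upper case, trimmed, "LAKE" prefix/suffix removed
normalize_name <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^LAKE\\s+", "", x)
  sub("\\s+LAKE$", "", x)
}
shapes$KEY <- normalize_name(shapes$NAME)
water$KEY  <- normalize_name(water$Lake)

# A key that occurs in more than one county cannot be matched reliably by name
counties_per_key <- tapply(shapes$COUNTY, shapes$KEY,
                           function(cty) length(unique(cty)))
ambiguous <- names(counties_per_key)[counties_per_key > 1]

# Restrict to Polk County, then join on the normalized name only
polk    <- shapes[shapes$COUNTY == "POLK", ]
polk_wq <- merge(polk, water[, c("KEY", "CHLOROPHYLL")], by = "KEY")
polk_wq$AMBIGUOUS <- polk_wq$KEY %in% ambiguous
print(polk_wq)
```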

Visualizations

West Roxbury

Housing Price Coefficients (table)

The coefficients of the final fitted model are presented here along with their associated metrics. Coefficients with a p-value less than 0.05 are considered significant.
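The base R code below sketches how such a table can be produced. It extracts the coefficient matrix from `summary()` of a fitted `lm` and flags coefficients with p < 0.05. The built-in `mtcars` data and the formula `mpg ~ wt + hp` are stand-ins for the housing model, which is not reproduced here.

```r
# Sketch using the built-in mtcars data as a stand-in for the housing model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary()$coefficients holds Estimate, Std. Error, t value and Pr(>|t|)
coef_table <- as.data.frame(summary(fit)$coefficients)

# Flag coefficients that meet the 0.05 significance threshold
coef_table$significant <- coef_table[["Pr(>|t|)"]] < 0.05
print(coef_table)
```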

Housing Price Coefficients (plot)

This plot shows the estimate for each coefficient. Estimates to the left of the zero line indicate a negative association with house value, while estimates to the right indicate a positive association, holding the other variables in the model constant.
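A coefficient plot of this kind can be sketched with base R graphics, again using `mtcars` as a stand-in for the housing data. The 95% confidence intervals drawn by `segments()` are an addition that the original figure may not include. For an ordinary least squares fit, an interval that excludes zero corresponds to a two-sided p-value below 0.05.

```r
# Sketch of a coefficient plot; mtcars stands in for the housing data and the
# 95% intervals are an addition that the original figure may not show
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)[-1]                      # drop the intercept
ci  <- confint(fit)[-1, , drop = FALSE]   # 95% intervals, one row per term
y   <- seq_along(est)

plot(est, y, pch = 19, yaxt = "n",
     xlim = range(ci, 0), xlab = "Estimate", ylab = "")
segments(ci[, 1], y, ci[, 2], y)          # horizontal interval bars
axis(2, at = y, labels = names(est), las = 1)
abline(v = 0, lty = 2)                    # the 'zero' reference line
```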

Polk County Lakes

Polk County Lakes

The figure above shows Polk County lakes with their average mercury, pH, chlorophyll, and calcium levels. Lakes with a pH between roughly 8 and 9 appear to have less mercury and more chlorophyll than lakes with a lower pH.
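One way to check a pattern like this numerically is to average the samples for each lake and then compare pH bands, as in the base R sketch below. The sample values, column names, and band cut points are made up for illustration and are not taken from the Florida Lakes data.

```r
# Hypothetical water quality samples: several samples per lake, made-up values
samples <- data.frame(
  lake        = c("A", "A", "B", "B", "C", "C"),
  pH          = c(6.0, 6.4, 8.2, 8.6, 7.0, 7.2),
  mercury     = c(0.8, 0.6, 0.2, 0.1, 0.5, 0.4),
  chlorophyll = c(6, 8, 40, 50, 12, 14)
)

# Average each measure per lake
lake_means <- aggregate(cbind(pH, mercury, chlorophyll) ~ lake,
                        data = samples, FUN = mean)

# Assign each lake to a pH band; right = FALSE makes the middle band [8, 9)
lake_means$pH_band <- cut(lake_means$pH, breaks = c(-Inf, 8, 9, Inf),
                          right = FALSE,
                          labels = c("below 8", "8 to 9", "9 and above"))

# Compare mean mercury and chlorophyll across the bands that contain lakes
band_means <- aggregate(cbind(mercury, chlorophyll) ~ pH_band,
                        data = lake_means, FUN = mean)
print(band_means)
```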

The Polk County lakes with available water quality data are presented above. Lake Parker, the largest of these lakes, appears to have considerably more algae than the other lakes shown.

Conclusions

Both sets of data used here lend themselves well to demonstrating multiple linear regression, interactive plots, and spatial visualization. Another idea was to fit a multiple linear regression model to the lakes data to predict chlorophyll, but this was left out in favor of the regression on the housing dataset. Future iterations of this work should revisit the join performed on the lake datasets. Because lakes were matched on name alone, it is likely that some water quality attributes were attached to a Polk County lake that shares its name with a lake in another county. A water quality dataset that also records the county of each lake would allow a more accurate join and, in turn, a more accurate analysis.
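If a water quality dataset with a county column were available, the join could use the lake name and county together as a composite key. The base R sketch below uses hypothetical data frames and column names. It contrasts the composite-key join with a name-only join, in which a Polk County lake picks up measurements from a same-named lake in another county.

```r
# Hypothetical tables: both carry a normalized lake name and a county
shapes <- data.frame(
  KEY    = c("ALPHA", "BETA", "BETA"),
  COUNTY = c("POLK", "POLK", "ORANGE")
)
water <- data.frame(
  KEY         = c("BETA", "BETA"),
  COUNTY      = c("POLK", "ORANGE"),
  CHLOROPHYLL = c(3.1, 9.4)
)

polk <- shapes[shapes$COUNTY == "POLK", ]

# Name-only join: Polk's BETA matches both BETA rows, one of them from Orange
name_only <- merge(polk, water[, c("KEY", "CHLOROPHYLL")], by = "KEY")

# Composite-key join: each measurement attaches only to the lake in its county
by_county <- merge(polk, water, by = c("KEY", "COUNTY"))

print(name_only)
print(by_county)
```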
